fix(runtime): use provider_opts.context_size for compaction #2814

Merged
dgageot merged 2 commits into docker:main from dgageot:board/67262d46e6c8e609
May 18, 2026

@dgageot dgageot commented May 18, 2026

Summary

Fixes #2800.

When Docker Model Runner (DMR) is configured with a model that isn't catalogued in models.dev — typically a HuggingFace GGUF such as huggingface.co/unsloth/qwen3.5-4b-gguf:Q4_K_M — automatic compaction silently became a no-op:

  • compactionContextLimit returned 0, so the LLM strategy bailed.
  • The proactive 90% trigger in runStreamLoop never fired.
  • Post-overflow recovery surfaced as "Failed to get model definition" every time the assistant tried to respond, with no way out.

The user already supplies a context_size in provider_opts for DMR to size the inference context. The runtime now uses that same value as the authoritative context limit when set, falling back to the models.dev catalogue otherwise. This keeps planning aligned with what the engine actually enforces.

Resolution order

  1. provider_opts.context_size (when set and parseable as a positive integer)
  2. models.dev catalogue limit
  3. 0 — caller treats as "can't compact"

A single LocalRuntime.resolveContextLimit helper is the source of truth, used by:

  • compactionContextLimit (LLM compaction strategy)
  • runStreamLoop proactive 90% trigger
  • EmitStartupInfo sidebar context-percent on session restore
  • compactWithReason post-compaction TokenUsageEvent

So the sidebar, the proactive trigger, and the LLM compactor all plan against the same number.
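
To illustrate why a shared limit matters, here is a rough sketch of a 90% trigger and a sidebar percentage computed from the same number. The function names `shouldCompact` and `contextPercent` are hypothetical, and whether the real trigger uses `>=` or `>` at the boundary is an assumption.

```go
package main

import "fmt"

// shouldCompact reports whether usage has reached 90% of the context limit.
// A non-positive limit means "unknown", so the trigger never fires.
// Integer arithmetic avoids float rounding; ">=" at the boundary is an assumption.
func shouldCompact(usedTokens, contextLimit int64) bool {
	if contextLimit <= 0 {
		return false
	}
	return usedTokens*10 >= contextLimit*9
}

// contextPercent is a sidebar-style usage figure derived from the same limit.
func contextPercent(usedTokens, contextLimit int64) float64 {
	if contextLimit <= 0 {
		return 0
	}
	return float64(usedTokens) / float64(contextLimit) * 100
}

func main() {
	const limit = 8192 // e.g. a user-supplied context_size
	for _, used := range []int64{4096, 7372, 7373} {
		fmt.Printf("used=%d percent=%.1f compact=%v\n",
			used, contextPercent(used, limit), shouldCompact(used, limit))
	}
	// With an unresolved limit (0) the trigger stays silent, which is
	// the no-op behaviour described in the summary.
	fmt.Println(shouldCompact(7373, 0))
}
```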

Tests

  • 12-case helper matrix covering int / int64 / int32 / float64 / float32 / string-decimal / whitespace / non-numeric / negative / zero / bool / missing-key / nil-opts.
  • Nil-provider safety.
  • provider_opts.context_size takes precedence over the catalogue.
  • Falls back to the catalogue when context_size is unset.
  • Falls back to provider_opts.context_size when modelsStore.GetModel errors (the exact reported scenario).
  • Returns 0 when neither source yields a usable limit.
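
For readers who want a feel for the helper matrix, the sketch below parses a `context_size` value from a loosely typed options map. It is an assumption-laden illustration rather than the tested helper: `parseContextSize` is a hypothetical name, and the choices to trim surrounding whitespace and to reject fractional floats are guesses about reasonable behaviour.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseContextSize (hypothetical) extracts a positive integer context size
// from provider options. It returns 0 for anything it cannot interpret.
func parseContextSize(opts map[string]any) int64 {
	raw, ok := opts["context_size"] // reading from a nil map is safe in Go
	if !ok {
		return 0
	}
	var n int64
	switch v := raw.(type) {
	case int:
		n = int64(v)
	case int32:
		n = int64(v)
	case int64:
		n = v
	case float32:
		n = wholeFloat(float64(v))
	case float64:
		n = wholeFloat(v)
	case string:
		parsed, err := strconv.ParseInt(strings.TrimSpace(v), 10, 64)
		if err != nil {
			return 0
		}
		n = parsed
	default: // bool and any other type are rejected
		return 0
	}
	if n <= 0 {
		return 0
	}
	return n
}

// wholeFloat converts v to int64 only when it is a whole number in a safe
// range; NaN, infinities and fractional values yield 0 (an assumption).
func wholeFloat(v float64) int64 {
	if v < 1 || v > 1<<53 || v != math.Trunc(v) {
		return 0
	}
	return int64(v)
}

func main() {
	cases := []map[string]any{
		{"context_size": 8192},
		{"context_size": int64(8192)},
		{"context_size": float64(8192)},
		{"context_size": " 8192 "},
		{"context_size": "lots"},
		{"context_size": -1},
		{"context_size": true},
		{},
		nil,
	}
	for _, opts := range cases {
		fmt.Printf("%v -> %d\n", opts, parseContextSize(opts))
	}
}
```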

`task lint` — 0 issues. `task test` — full suite passes.

Closes #2800

dgageot added 2 commits May 18, 2026 11:52
Local models not catalogued in models.dev (e.g. DMR with HuggingFace
GGUFs) can now supply context_size via provider_opts to enable
compaction. When models.dev lookup fails, the runtime falls back to
this user-supplied limit, making compaction (proactive threshold and
post-overflow recovery) functional for uncatalogued models.

Fixes docker#2800

Self-review of the previous commit surfaced four issues:

  * compactIfNeeded carried an unused *modelsdev.Model parameter; drop it
    and let the call sites pass the resolved contextLimit only.
  * EmitStartupInfo and compactWithReason did their own catalogue-only
    lookup, so the sidebar's context-percent and the post-compaction
    TokenUsageEvent stayed inconsistent with the freshly-fixed compaction
    triggers in loop.go and session_compaction.go.
  * The provider_opts.context_size fallback was second-class. The user
    typed that number in their config, and DMR allocates exactly that
    much; treat it as authoritative when set, with the catalogue as
    fallback. This also makes the resolution uniform across providers
    rather than depending on whether the catalogue has the model.
  * The dual implementation of priority order (catalogue-first in
    runStreamLoop, provider-first elsewhere) was a footgun.

Extract resolveContextLimit on LocalRuntime as the single source of
truth. compactionContextLimit, runStreamLoop, EmitStartupInfo and
compactWithReason now route through it, so the sidebar, the proactive
trigger and the LLM compactor all plan against the same number.
@dgageot dgageot requested a review from a team as a code owner May 18, 2026 10:05
@docker-agent

PR Review Failed — The review agent encountered an error and could not complete the review. View logs.

@dgageot dgageot merged commit cf296d8 into docker:main May 18, 2026
8 checks passed

Development

Successfully merging this pull request may close these issues.

Failed to get model definition wit /compact + Docker Model Runner
